ALOJA: A Benchmarking and Predictive Platform for Big Data Performance Analysis

Authors

  • Nicolás Poggi
  • Josep Lluis Berral
  • David Carrera
Abstract

The main goals of the ALOJA research project from BSC-MSR are to explore and automate the characterization of the cost-effectiveness of Big Data deployments. Over its first year, the project has produced an open-source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools to gather insights about system cost-performance. This article describes the evolution of the project's focus and research lines over more than a year of continuously benchmarking Hadoop under different configuration and deployment options, presents results, and discusses the motivation, both technical and market-based, for these changes. During this time, ALOJA's target has evolved from low-level profiling of the Hadoop runtime, through extensive benchmarking and evaluation of a large body of results via aggregation, to currently leveraging Predictive Analytics (PA) techniques. Modeling benchmark executions allows us to estimate the results of new or untested configurations or hardware set-ups automatically by learning from past observations, saving benchmarking time and costs.

Similar resources

ALOJA: A Framework for Benchmarking and Predictive Analytics in Big Data Deployments

This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs...

Evaluating the impact of SSDs and InfiniBand in Hadoop cluster performance and costs

This report evaluates the impact, in terms of performance and costs, of introducing Solid State Drives (SSDs) and InfiniBand networking technologies over a typical commodity Hadoop cluster setup for Big Data processing. For the evaluations, over 1,000 benchmark runs are executed on the same Hadoop cluster, but varying the software (SW) configuration to find the best performance of a particular ...

A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection

Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied on big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....

Applying Network Data Envelopment Analysis to Determine a Criterion for Benchmarking in Regional Electricity Companies of Iran.

One of the effective methods for improving the efficiency of an organization is benchmarking against successful organizations. Benchmarking is not only a technique for identifying problems; it also greatly helps managers with the design of processes. Among strategic and infrastructure industries in each country, the electricity industry is one of the most important and criti...

Big Data Analytics and Now-casting: A Comprehensive Model for Eventuality of Forecasting and Predictive Policies of Policy-making Institutions

The ability of now-casting and eventuality is the most crucial and vital achievement of big data analytics in the area of policy-making. To recognize the trends and to render a real image of the current condition and alarming immediate indicators, the significance and the specific positions of big data in policy-making are undeniable. Moreover, the requirement for policy-making institutions to ...

Publication date: 2015